Idiosyncratic fixation patterns generalize across dynamic and static facial expression recognition

Facial expression recognition (FER) is crucial for understanding the emotional state of others during human social interactions. It has been assumed that humans share universal visual sampling strategies to achieve this task. However, recent studies in face identification have revealed striking idiosyncratic fixation patterns, questioning the universality of face processing. More importantly, very little is known about whether such idiosyncrasies extend to the biologically relevant recognition of static and dynamic facial expressions of emotion (FEEs). To clarify this issue, we tracked observers’ eye movements while they categorized static and ecologically valid dynamic faces displaying the six basic FEEs, all normalized for presentation time (1 s), contrast and global luminance across exposure time. We then used robust data-driven analyses combining statistical fixation maps with hidden Markov models to explore eye movements across FEEs and stimulus modalities. Our data revealed three spatially and temporally distinct, equally occurring face scanning strategies during FER. Crucially, such visual sampling strategies were mostly comparably effective in FER and highly consistent across FEEs and modalities. Our findings show that spatiotemporal idiosyncratic gaze strategies also occur for the biologically relevant recognition of FEEs, further questioning the universality of FER and, more generally, face processing.

were more likely generated by the representative HMM of Group 1 rather than by those of Group 2 and 3 (F(2, 318) = 95.71, p < 0.001). The same pattern of results was obtained for data from Group 2 (F(2, 276) = 166.97, p < 0.001) and from Group 3 (F(2, 279) = 319.16, p < 0.001). A 3-sample test for equality of proportions without continuity correction revealed that Group 1 (319 eye-movement datasets), Group 2 (277 eye-movement datasets) and Group 3 (280 eye-movement datasets) were statistically comparable in size (χ²(2) = 5.64, p = 0.06). However, sROIs 1 and 3 in Group 1, and sROIs 2 and 3 in Group 2, are duplicates of each other (Supplementary Fig. 1). Therefore, their transition probabilities can be collapsed together. The final number of sROIs was nonetheless kept to three to fully account for the data of Group 3.
The final three representative fixation patterns are illustrated in Figs. 1 and 2. Group 1's first fixation was located within an area encompassing the mouth, left eye, and the internal corner of the right eye (after collapsing red and blue sROIs, 100% cumulative starting probability, SP). Following fixations were then directed towards the region spanning the mouth, nose, and right eye (green sROI, 100% transition probability, TP). Comparatively, the majority of Group 2's first fixations were directed more towards the mouth, and only partially towards the left eye (red sROI, 96% SP). Subsequent fixations were then restricted to a central region of the face (after collapsing green and blue sROIs, 100% cumulative TP) spanning predominantly from the nasion to the mouth. The third group discovered using EMHMM exhibited a more focal pattern. After the majority of first fixations were directed towards the left eye (red sROI, 86% SP), fixations either shifted towards the midline of the face (blue sROI, 49% TP) or remained focused around the eyes (green sROI, 51% TP) with similar probability. In both cases, the next most likely fixation location was within the eye region (from midline to eye-region, 64% TP; no shift from the eye-region, 70% TP), while only a smaller proportion of fixations were redirected towards the face midline (from eye-region to midline, 30% TP; or no shift from midline, 36% TP).
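The transition structure described for Group 3 can be written out as a small Markov chain. The following is a minimal sketch that uses only the transition probabilities quoted above; the zero entries (no transitions back to the red sROI) and the choice to condition on a first fixation in the red sROI are assumptions made for illustration, and the function name `step` is hypothetical.

```python
# States follow the colour coding reported for Group 3:
# red = left eye (start), green = eye region, blue = face midline.
STATES = ["red", "green", "blue"]

# Rows: state of the current fixation; columns: state of the next fixation.
# Zeros in the first column are an assumption: the text reports no
# transitions back to the red sROI.
TRANSITIONS = [
    [0.00, 0.51, 0.49],  # from red
    [0.00, 0.70, 0.30],  # from green (eye region)
    [0.00, 0.64, 0.36],  # from blue (midline)
]


def step(dist, transitions):
    """Propagate a probability distribution over states by one fixation."""
    n = len(transitions)
    return [sum(dist[i] * transitions[i][j] for i in range(n)) for j in range(n)]


first = [1.0, 0.0, 0.0]             # condition on a first fixation in red
second = step(first, TRANSITIONS)   # distribution of the second fixation
third = step(second, TRANSITIONS)   # distribution of the third fixation

for name, prob in zip(STATES, third):
    print(f"{name}: {prob:.4f}")
```

Under these assumptions the third fixation falls in the eye region about two times out of three, in line with the description that the eye region is the most likely next location from either state.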

Spatial comparisons of fixation patterns with iMap4
To better disentangle the differences between the three representative fixation patterns discovered using EMHMM, we explored their spatial distributions using iMap4. iMap4 is a data-driven method that assesses statistical differences in terms of fixation distributions only, without considering their temporal relationship. Our linear mixed model revealed significant differences in the fixation location across the three Groups (Fig. 2). Specifically, the comparison of Group 1 and 2 revealed two significant clusters. The first cluster was characterized by a greater number of fixations towards the eye-region by Group 1 (F(1,128) = 63.24

Generalization of sampling strategies across FEEs and modalities
EMHMM revealed three representative visual sampling strategies during FER, relating to specific fixation patterns. Each eye-movement dataset (73 subjects × 6 emotions × 2 modalities) was classified by EMHMM as belonging to one of these visual sampling patterns. In this section, we explored whether the clustering of these datasets was influenced by FEE or modality. In other words, for each subject, we determined whether different FEEs and stimulus modalities would trigger comparable or different sampling strategies. When considering the distribution of EM datasets for each observer across the three groups (Fig. 3), we found 28 out of 73 subjects (39%) to be "consistent observers". Specifically, the eye-movement patterns these observers exhibited across the 12 different conditions were all clustered within the same group, although different subjects could belong to different groups. The proportion of consistent subjects was significantly higher than the proportion of observers consistent in only 6, 7, 8, 9, 10 or 11 conditions (χ²(1) = 20.63, p < 0.001; χ²(1) = 9.33, p < 0.01; χ²(1) = 11.33, p < 0.001; χ²(1) = 13.60, p < 0.001; χ²(1) = 17.55, p < 0.001; χ²(1) = 17.55, p < 0.001). Importantly, consistent participants were equally present across the three groups (χ²(2) = 1.79, p = 0.41). To account for the large number of conditions, and therefore for the increased probability of finding one or two conditions clustered in a separate group by chance, we redefined the concept of a "consistent observer" to include those participants who showed a stable fixation strategy across at least 10 conditions. Forty-three subjects (60%) fit this definition, and the subsequent comparison with the number of participants not meeting this criterion (30 individuals, 40%) showed a significant difference (χ²(1) = 4.99, p < 0.05). In the next section, we explore in more detail whether sampling strategies depend on FEE and modality.
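The two definitions of a "consistent observer" amount to a simple counting rule over the 12 cluster labels of each observer. A minimal sketch follows, assuming labels coded 1 to 3; the function name `consistency`, the example labels and the handling of ties are illustrative assumptions, not the analysis code used here.

```python
from collections import Counter


def consistency(labels, threshold=10):
    """Return (prevalent_group, n_conditions_in_that_group, is_consistent).

    `labels` holds one cluster label per condition (12 per observer).
    If two groups tie for the highest count, no group prevails and
    None is returned as the prevalent group.
    """
    counts = Counter(labels).most_common()
    group, n = counts[0]
    if len(counts) > 1 and counts[1][1] == n:
        return None, n, False
    return group, n, n >= threshold


# Hypothetical observer: 6 FEEs x 2 modalities = 12 cluster labels.
observer = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 3]

print(consistency(observer))                # relaxed criterion (>= 10 conditions)
print(consistency(observer, threshold=12))  # strict criterion (all 12 conditions)
```

Raising `threshold` to 12 recovers the strict definition, under which this hypothetical observer would no longer count as consistent.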

Eye-movement patterns across modalities
We first explored the impact of modality on eye-movement patterns for each FEE separately. Results showed that during recognition of anger, fear, sadness and surprise, the eye-movement patterns for static and dynamic stimuli were clustered within the same group for 86%, 82%, 86% and 88% of observers respectively (Fig. 4a). Sampling strategies were also stable across modalities for the recognition of disgust and happiness, although for a smaller proportion of subjects (74% and 78% respectively). These proportion differences between the 6 emotions were not significant (χ²(5) = 7.39, p = 0.19). Further examination revealed that 44% of the observers had stable fixation strategies across modalities for all emotions (Fig. 4b). It is however important to note that the emotions could differ in the eye-movement patterns they triggered and therefore be clustered in different groups (e.g., the EM patterns of participant X for anger in both static and dynamic conditions might be clustered within Group 1, while for the same participant the EM patterns for disgust, both static and dynamic, might be clustered within Group 2). The sampling strategies of the remaining observers (56%) were impacted by presentation modality to different degrees, depending on the FEE considered. Specifically, 26% of observers exhibited fixation patterns that were modality-independent for 5 emotions, 19% for 4 emotions, 5% for 3 emotions, 4% for 2 emotions and 1% for 1 emotion only (Fig. 4b). Differences in these proportions were statistically significant (χ²(5) = 75.05, p < 0.001). More precisely, the observers with modality-independent sampling strategies for all FEEs were significantly more numerous than those who were stable for only 3, 2 or 1 emotion. This was also the case for observers who were stable for 5 FEEs compared to those who were stable for 1, 3 (p < 0.001) and 2 (p < 0.05) FEEs. The proportion of stable observers for 3 FEEs was significantly greater than the proportion of stable individuals for 2 (p < 0.05) or 1 (p < 0.001) FEE. Finally, a significantly higher proportion of observers were stable for 2 FEEs compared to those who were stable for only 1 FEE (p < 0.05). Importantly, all pairwise comparisons were corrected for multiple comparisons using the Holm-Bonferroni procedure.
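For readers less familiar with the Holm-Bonferroni procedure, its step-down logic can be sketched in a few lines. The function name `holm_bonferroni` and the example p-values are hypothetical and serve only to illustrate the procedure; they are not values from this study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with
        # alpha / (m - k + 1), i.e. alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


# Hypothetical p-values from four pairwise comparisons.
example = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(example))
```

Because the procedure stops at the first non-significant comparison, a p-value of 0.04 is not rejected here even though it is below the uncorrected 0.05 level.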

Eye-movement patterns across expressions
In this section, we explored whether a given facial expression of emotion would trigger a specific sampling strategy (Fig. 5). Comparing the presence of EM datasets associated with a specific FEE across groups revealed that the sampling strategies deployed during the recognition of dynamic fear and of static and dynamic happiness were more often clustered within Group 2 (dynamic fear: χ²(2) = 8.07, p < 0.01; static happiness: χ²(2) = 8.00, p < 0.05; dynamic happiness: χ²(2) = 18.61, p < 0.001). Specifically, sampling strategies associated with dynamic fear were significantly more present in Group 2 compared to Group 3 (p < 0.05), while the difference with Group 1 was not significant. Eye-movement patterns triggered by static happiness were significantly more present in Group 2 than Group 1 (p < 0.05) but not Group 3. Finally, dynamic happiness triggered sampling strategies that were more often clustered in Group 2 than Group 1 or 3 (p < 0.001). No other combination of FEE and modality of presentation showed a clear association with a specific group of eye-movement patterns. Finally, exploring the composition of each group, separately for each modality, revealed that the EM datasets associated with each emotion were equally frequent within Group 1 (static: χ²(5) = 6.63, p = 0.24; dynamic: χ²(5) = 7.22, p = 0.20) and within Group 2 (static: χ²(5) = 3.32, p = 0.65; dynamic: χ²(5) = 6.41, p = 0.26). In contrast, within Group 3 we found a significantly different frequency of datasets associated with different emotions in both modalities (static: χ²(5) = 12.54, p < 0.05; dynamic: χ²(5) = 21.79, p < 0.001). Specifically, within the dynamic modality, EM datasets related to the recognition of happiness were significantly more present than those related to the recognition of anger (p < 0.05) and of fear (p < 0.01). Within the static modality, no contrasts were significant. Importantly, all pairwise comparisons were corrected using the Holm-Bonferroni procedure.

Idiosyncratic fixation patterns and performance during FER
In this final section, we investigated whether the three representative sampling strategies identified using EMHMM had any impact on facial expression recognition performance. Given the differences in our sample sizes and potentially different variances across groups, we carried out our analysis using the non-parametric Mann-Whitney U-test. After retrieving the recognition accuracy related to each eye-movement dataset, we compared performance across groups for each emotion and modality independently. Applying Bonferroni correction for multiple comparisons, we only found a significant result for the recognition of the dynamic expression of anger (Fig. 6). Specifically, the recognition accuracy for the dynamic expression of anger was significantly higher in Group 2 compared to Group 1 (W(1) = 129.5, p < 0.01, 95% CI [-0.169, -0.037]). All other statistical values are included in Table 1.

Discussion
Our overarching goal was to investigate whether the eye movements exhibited during FER could be categorized into a single group or into distinct groups of fixation patterns, isolate potential differences in sampling strategies, and assess the consistency of the observers' eye movements across stimulus modality (i.e., static vs. dynamic). We identified three distinct, equally effective, and mutually exclusive idiosyncratic visual sampling strategies for static and dynamic FER in Western observers. The first strategy involved vertical eye fixations alternating between two regions: one included the left eye, the other the right eye, and both included the mouth. The second strategy was characterized by eye movements that alternated less between individual face features and were more bound within the central region of the face. This included the inner corner of the eyes, the nose, and the whole mouth region. Finally, the third strategy markedly differed from the previous two from a temporal and a spatial point of view. Within this group, fixations alternated between the eye and a vertical area encompassing the nose and mouth regions. All three strategies were highly consistent across all six basic FEEs and both modalities and, importantly, did not modulate FER performance. It is worth noting that these visual sampling strategies were identified by considering the fixation patterns resulting from 12 eye-movement (EM) datasets (observations for the 6 FEEs × 2 stimulus modalities), rather than by categorizing observers by their sampling strategy averaged across 12 conditions. Our results showed that these visual fixation patterns were statistically stable. Most of the observers (60%) either predominantly favored one strategy among the three identified or adopted a non-systematic alternative strategy for only one or two conditions of the experimental task. Altogether, our findings refine previous research in emotion processing 21,33 by showing spatiotemporal idiosyncratic differences in FER, strongly challenging the traditional view of a single face processing format for the decoding of FEEs. Importantly, effective FER can be achieved by sampling different combinations of multiple facial features.
Yitzhak and colleagues 21 previously addressed a similar question by analysing fixations on single predetermined face features (i.e., eye, nose, and mouth lookers), while excluding fixation transitions across face regions. Additionally, their study was limited to the use of non-ecologically valid dynamic FEEs. To overcome these limitations, we used static and dynamic FEEs and combined two robust, well-established data-driven statistical approaches to eye movements previously validated in other face perception studies 34–37: the EMHMM 22 and iMap4 23 toolboxes. As a result, our data-driven statistical approaches specifically isolated the idiosyncratic spatiotemporal information of the fixations dedicated to the decoding of static and dynamic FEEs. More precisely, Group 1 displayed scan paths starting from two overlapping vertical left-central statistical Regions of Interest (sROIs) encircling the left eye and mouth. All fixations originated from this region and ended in a mirror-symmetric vertical right-central sROI, this time encompassing the right eye. Scan paths in Group 2 exhibited a comparable vertical arrangement of eye movements, with fixations predominantly starting at a vertical left-central sROI before descending to the face midline, encompassing the nose and mouth regions yet including just the inner corner of the eyes. Group 2 displayed fewer fixation transitions across face regions, yet a significantly higher number of fixations to the mouth region, when compared with the other two groups (as highlighted with iMap4 in Fig. 2). Lastly, Group 3 exhibited a more focal pattern of fixations, with marked eye-movement differences in both spatial and temporal dimensions. Fixations initiated at one small round sROI encircling the right eye, and then either shifted towards the midline of the face or remained focused on the eyes with similar probability. The next most likely transition in both cases was towards the eye region. Finally, a smaller proportion of fixations were redirected towards the face midline.
www.nature.com/scientificreports/
Our next objective was to assess whether idiosyncratic scanning patterns generalize across the 6 FEEs and the 2 stimulus modalities (12 conditions). We started by assessing how the 12 EM datasets of each observer were distributed across the three identified face scanning strategies (Group 1, 2 or 3). This procedure allowed us to evaluate how consistent the observers were in their fixation strategy. This analysis revealed that 39% of observers consistently adhered to one single scanning strategy and had all twelve datasets of static and dynamic FEEs clustering within one group. However, expecting observers to be consistent over 12 different conditions is a strict requirement, which overlooks the probability of random inconsistencies. Hence, adjusting the threshold to 10 stable conditions revealed that more than half of our subjects (60%) exhibited consistent sampling strategies. This stable use of a single fixation pattern across facial expressions and stimulus modality at the individual level confirms and extends previous similar findings, which were however only reported at the group level 12,20,24,38. Moreover, we noted that observers were equally distributed across the three groups, suggesting no direct link between a specific pattern of fixation and the frequency of its usage.
Following up on these results, we then questioned whether any observed shift in strategy was driven by any specific FEE or stimulus modality (static vs. dynamic). Concerning stimulus modality, our data confirmed the consistent use of a single scanning strategy across both static and dynamic stimuli for most observers, suggesting that stimulus modality does not significantly impact visual strategies during FER. This contradicts previous findings reporting stable central fixations for dynamic FER compared to distributed fixations for static FER 24. Potential explanations for this discrepancy might lie in some methodological differences between the two studies. First, in our study, we used face stimuli subtending a larger visual angle (14°) to elicit fixations on distinct facial features and better approximate the natural size of faces encountered during real-life social interactions. In contrast, the dynamic FEEs used by Blais and colleagues 24 subtended a smaller visual angle (5.72°). This might have allowed observers to sample most of the relevant facial information by only fixating the center of the faces, effectively reducing the need to visually explore the stimuli. Conversely, the larger visual angle used in the present study required observers to perform more fixations across the whole face in order to gather the visual information necessary for FER. Secondly, in the study conducted by Blais and colleagues 24, stimulus duration was set to 500 ms. Under these time constraints, fixating the center of the face might have been more efficient for gathering information as quickly as possible, rather than tracking the different moving parts. Such a short presentation time limited the number of fixations, reducing observers' opportunity to fully explore the face stimuli. In contrast, we presented face stimuli for 1 s, closer to the duration of the natural unfolding of dynamic facial expressions. This longer and more ecological stimulus duration might have allowed our participants to sample the expressions more closely as they evolved over time. Taken together, our methodological choices revealed that observers' idiosyncratic fixation patterns generalize across static and dynamic facial expressions, reflecting consistent scanning strategies independent of modality.
Concerning the impact of FEEs on scanning strategies, our data revealed that the strategy of Group 1 was significantly more frequently used to recognize "dynamic fear", while Group 2's sampling strategy was more often used to recognize "happiness". These two observations suggest a potential link between particular emotions (happiness and dynamic fear) and the way observers allocate their attention to specific facial features during FER 38–42. This idea is further supported by the differences in fixation locations between the two groups revealed by iMap4. Specifically, we found that Group 1 is characterized by a greater utilization of the eye region, which has been shown to attract fixations more frequently during the decoding of fear 39. Additionally, allocating fixations to the eyes during the decoding of this FEE has been associated with enhanced classification accuracy 42,43. On the other hand, the sampling strategy found in Group 2 focuses comparatively more on the mouth. Similarly, this is a diagnostic region that is sufficient, but also necessary, for the recognition of happiness 40,41. No other FEE was statistically associated with any specific sampling pattern. This might suggest that for the decoding of the remaining FEEs, observers can efficiently gather relevant information by mainly using the same sampling strategy. Taken together, our findings support the idea that each individual develops the most effective strategy that will benefit them in most situations, bypassing the need to constantly adapt to the presented stimulus.
Finally, we examined whether a specific sampling strategy was more effective than the others for FER performance. Our data revealed comparable FER performance across the different groups for most FEEs, except for the decoding of the dynamic expression of anger by observers in Group 2, who performed significantly better than those in Group 1. Fixation maps obtained for both groups suggest that observers who relied more on the mouth (Group 2) performed more accurately, which is in line with previous findings 21,42,43.

Conclusions
To conclude, our data revealed three distinct idiosyncratic visual strategies during FER by quantitatively measuring both spatial and temporal dimensions of eye fixations in a large group of healthy young adults. These strategies are equally effective for FER and generalize well across all six basic expressions and across the static and dynamic modalities. These observations were established by using strong methodological approaches relying on robust data-driven analyses, coupled with ecologically valid static and dynamic FEE stimuli matching the visual angle and temporal duration of real-life interactions. The information intake of the visual system is not universal, even for the biologically relevant recognition of FEEs. Individual differences are present in diverse face processing tasks, and future research is necessary to clarify the cognitive and neurofunctional roots of these observations.

Participants
When assessing individual differences in sampling strategies during scene perception, Hsiao et al. (2021) observed a large effect size with 60 participants. As a precaution, we increased this number to 70. In the end, a total of 73 Western adult observers (12 males, age range 18-30 years, M = 21.53, SD = 3.03) participated in this experiment. All participants had normal or corrected-to-normal vision with no history of neurological or psychiatric disorders. They were recruited at the University of Fribourg (Switzerland) and received course credits for their participation. The study was approved by the ethical committee of the University of Fribourg. All experiments were performed in accordance with relevant guidelines and regulations. Informed consent was obtained from all participants before starting.

Stimuli and procedure
A total of 48 stimuli created by Gold and colleagues 28 were used. The stimuli consisted of 4 female and 4 male identities, each portraying the six basic facial expressions of emotions (FEEs): anger, disgust, fear, happiness, sadness, and surprise (Ekman 1). Each expression was presented in either a dynamic or a static version.
The dynamic stimuli evolved from a neutral to a fully articulated expression over the course of 30 frames, while the static stimuli showed 30 repetitions of the last frame of each dynamic sequence corresponding to the apex of each expression (Fig. 7). The stimuli were normalized for their contrast, luminance and amount of energy transmitted over presentation time using the SHINE Toolbox 44. Finally, we added visual noise to each frame of the static sequences to match the luminance and contrast that were present in each frame of the dynamic stimuli. Both procedures are detailed in a recent work by Richoz and colleagues 45. Stimuli subtended 13.84° height × 11.02° width of visual angle at a distance of 70 cm from the screen and were shown on a VIEWPixx/3D screen with a resolution of 1920 × 1080 pixels.
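As a worked illustration of the geometry, the on-screen size corresponding to a given visual angle follows from size = 2 × distance × tan(angle / 2). The sketch below applies this standard relation to the reported angles and viewing distance; the function name `size_from_visual_angle` is hypothetical, and the resulting sizes are derived values rather than measurements reported in this study.

```python
import math


def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) of a stimulus subtending `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Reported stimulus geometry: 13.84 deg x 11.02 deg at 70 cm.
height_cm = size_from_visual_angle(13.84, 70.0)
width_cm = size_from_visual_angle(11.02, 70.0)

print(f"height: {height_cm:.2f} cm, width: {width_cm:.2f} cm")
```

Under these values the faces would measure roughly 17 cm × 13.5 cm on the screen.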
The experiment was carried out using the Psychophysics 46,47 and the EyeLink 48 Toolboxes running on Matlab (R2014b, The MathWorks, Natick, MA). The oculomotor behavior of observers was recorded by tracking their left eye using an EyeLink 1000 Desktop Mount with a sampling rate of 1000 Hz. A nine-point calibration procedure was implemented before each testing session and repeated every 48 trials to ensure accurate gaze tracking. Each trial started with a fixation cross displayed at the center of the screen, and participants were required to fixate it until it disappeared. This procedure was used to ensure the precision of calibration. Observers performed a total of 576 trials composed of 96 unique trials (6 FEEs × 8 identities × 2 modalities), each one repeated 6 times. On each trial, the face stimulus appeared in one of six randomized locations on the screen to reduce anticipatory strategies and ensure that the location of the first fixation was self-determined by the observer. Stimuli were presented for 1 s at a frame rate of 30 Hz. This time constraint was used to prevent observers from adopting random sampling strategies after facial expression recognition and to ensure a higher ecological validity of the recorded eye-movement data. Participants were instructed to freely explore each face and judge the facial expression portrayed. Following the 1-s presentation of the face stimulus, a list of all six expressions appeared on the screen. Participants used the corresponding keyboard keys (i.e., a dedicated letter key for each expression) to select the perceived expression from this list (Fig. 8). The response window remained on the screen until a selection was made. Note that participants also had the option to choose a key labeled "I don't know" if they did not have enough time to see a given expression or did not know the answer.
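The trial structure described above (96 unique trials, each repeated 6 times, each shown at one of six locations) can be sketched as follows. This is an illustrative reconstruction only: the identity labels, the location indices, the shuffling of trial order and the function name `build_trials` are assumptions, and the experiment itself was run in Matlab.

```python
import itertools
import random

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
IDENTITIES = list(range(1, 9))       # 8 identities (labels are assumed)
MODALITIES = ["static", "dynamic"]
REPETITIONS = 6
N_LOCATIONS = 6                      # six possible screen locations


def build_trials(seed=0):
    """Return a shuffled list of trial dictionaries."""
    rng = random.Random(seed)
    unique = list(itertools.product(EMOTIONS, IDENTITIES, MODALITIES))
    trials = [
        {
            "emotion": emotion,
            "identity": identity,
            "modality": modality,
            # Location index drawn at random on every trial (assumption).
            "location": rng.randrange(N_LOCATIONS),
        }
        for (emotion, identity, modality) in unique
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)              # randomized trial order (assumption)
    return trials


trials = build_trials()
print(len(trials))
```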

Statistical analysis
Preprocessing for data analysis
We applied the adaptive velocity algorithm developed by Nyström and Holmqvist 49 to extract fixations. These were then realigned to a normalized space using the iTemplate toolbox 50. Finally, we filtered the data based on trial accuracy, as further analyses focused on fixations performed during correct trials only.
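To convey the idea of a data-driven velocity threshold, the sketch below iteratively re-estimates a saccade threshold from the samples that fall below the current one. It is a simplified illustration of adaptive thresholding in general, not a reimplementation of the published algorithm: the starting value, the multiplier of 6 standard deviations, the tolerance, the example velocities and the function name `adaptive_velocity_threshold` are all assumptions.

```python
import statistics


def adaptive_velocity_threshold(velocities, initial=200.0, n_sd=6.0,
                                tol=1.0, max_iter=100):
    """Iteratively estimate a saccade velocity threshold (deg/s).

    At each iteration, the threshold is set to mean + n_sd * SD of the
    samples currently below it, until it changes by less than `tol`.
    """
    threshold = initial
    for _ in range(max_iter):
        below = [v for v in velocities if v < threshold]
        if len(below) < 2:
            break  # not enough samples to estimate a standard deviation
        new_threshold = statistics.mean(below) + n_sd * statistics.stdev(below)
        if abs(new_threshold - threshold) < tol:
            return new_threshold
        threshold = new_threshold
    return threshold


# Hypothetical angular velocities: slow fixation samples and two saccade samples.
velocities = [10.0, 12.0, 9.0, 11.0, 10.5, 9.5, 400.0, 380.0, 10.0, 11.5]
threshold = adaptive_velocity_threshold(velocities)
fixation_samples = [v for v in velocities if v < threshold]
print(round(threshold, 2), len(fixation_samples))
```

The appeal of such a scheme is that the threshold adapts to the noise level of each recording instead of relying on a single fixed value.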

EMHMM
To explore eye-movement data and quantitatively evaluate differences and similarities among individuals, we used the EMHMM toolbox 51, which employs hidden Markov models (HMMs). HMMs capture in a compact fashion both spatial and temporal components of gaze behavior, proving to be particularly useful to analyze scan paths on faces 52–55. We started by providing the EMHMM algorithm with a total of 876 eye-movement (EM) datasets (73 participants × 12 conditions; Supplementary Fig. 2). Then, we used a variational Bayesian expectation maximization algorithm to estimate one HMM for each of the 876 EM datasets, with the random number generator seeded at 1000. To obtain each model, the algorithm determined the optimal number of hidden states within a predefined range from 1 to 3. In this context, hidden states correspond to statistical regions of interest (sROIs) on the face stimulus, and the optimal number of sROIs corresponds to the solution that maximizes the log-likelihood of each model. These sROIs are represented by ellipses in HMM models. Subsequently, we used a variational hierarchical expectation maximization algorithm to cluster individual models together, with the random number generator seeded at 1001. In this case, we used a fixed predefined number of hidden states corresponding to the median number of sROIs observed across the 876 models. To determine how many significantly different groups existed within our dataset, we adopted the following approach. We started by clustering individual models into 2 groups, which were then compared statistically. If they differed significantly, we increased the number of groups and repeated the comparison. This cycle was iterated until the resulting groups were no longer different. Finally, we determined, for each of the 876 EM datasets, the group to which it was allocated. Using this information, we then explored scanning pattern consistency between modalities, FEEs and within subjects. For example, if the fixation patterns of subject A elicited while viewing static and dynamic FEEs were assigned to the same cluster, we could infer that the modality of presentation does not have a significant modulation effect on sampling strategies.
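Once every EM dataset has been allocated to a group, one may ask whether the groups are comparable in size. The sketch below shows one way of computing a k-sample test for equality of proportions without continuity correction, applied to the group sizes reported in the Results (319, 277 and 280). It assumes that each group is scored against the same total of 876 datasets; this reading reproduces the reported statistic but is an assumption about how the test was set up. The function name `equal_proportions_chi2` is hypothetical, and the closed-form p-value used here is valid for 2 degrees of freedom only.

```python
import math


def equal_proportions_chi2(counts):
    """Chi-square statistic and df for equal shares of a common total.

    Each count is treated as the number of "successes" out of the same
    total, so the pooled proportion is 1 / k for k groups.
    """
    total = sum(counts)
    k = len(counts)
    p = 1.0 / k                      # pooled proportion
    expected = total * p
    chi2 = sum((c - expected) ** 2 for c in counts) / (expected * (1.0 - p))
    return chi2, k - 1


chi2, df = equal_proportions_chi2([319, 277, 280])

# For df = 2 the chi-square survival function reduces to exp(-x / 2).
p_value = math.exp(-chi2 / 2.0) if df == 2 else None

print(f"chi2({df}) = {chi2:.2f}, p = {p_value:.2f}")
```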

Fixation map analysis
We used the iMap4 23 toolbox to statistically assess differences between the groups previously obtained using the EMHMM toolbox. Specifically, here we aimed to explore the representative fixation patterns obtained using EMHMM in terms of spatial information only. EMHMM considers the number of fixations for spatial analyses; therefore, we used the same measure when comparing the clusters in this section. iMap4 uses data-driven methods to assess statistical differences in terms of fixation distribution, without taking into consideration their temporal relationship. Fixation maps were smoothed using a two-dimensional Gaussian kernel function at 1° of visual angle by selecting the estimated option. This method consists of computing, for each condition and observer, the expected values across trials. Finally, we normalized the maps by dividing them by the number of fixations performed in each trial. A pixel-wise linear mixed model was then applied to the smoothed normalized fixation maps, and a multiple comparison correction was conducted by using a bootstrap spatial clustering method to control for Type I errors. The fixation map analysis comparing the identified groups used number of fixations as the response variable and participants as a random predictor to account for the dependency between observations.
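The smoothing and normalization steps can be illustrated with a toy example. The sketch below places one two-dimensional Gaussian at each fixation location and divides the summed map by the number of fixations. It is a conceptual illustration only and does not reproduce iMap4: the grid size, the kernel width in pixels (which would depend on the pixels-per-degree of the setup), the fixation coordinates and the function name `fixation_map` are assumptions.

```python
import math


def fixation_map(fixations, width, height, sigma_px):
    """Smoothed fixation map normalized by the number of fixations.

    `fixations` is a list of (x, y) pixel coordinates; the returned map
    is a list of rows, indexed as map[y][x].
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma_px ** 2))
    n = len(fixations)
    return [[value / n for value in row] for row in grid]


# Hypothetical trial with two fixations on a small 11 x 9 pixel grid.
demo = fixation_map([(3, 4), (7, 4)], width=11, height=9, sigma_px=1.5)
peak = max(max(row) for row in demo)
print(round(peak, 3))
```

Dividing by the fixation count keeps maps comparable across trials that differ in the number of fixations.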

Behavioral analysis
We used the unequal variances Mann-Whitney U-test to examine accuracy variations between groups while considering both expression and modality as factors. This involved comparing performance for one expression in one modality (e.g., Dynamic Anger) at a time between the three groups (3 comparisons each time). We corrected for multiple comparisons by dividing the significance level α = 0.05 by 3. All models were fitted in R 4.2.2 56.
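The logic of one such pairwise comparison can be sketched as follows. The example computes the U statistic by pairwise counting and derives a two-sided p-value from the normal approximation, which is then compared with the Bonferroni-adjusted level of 0.05 / 3. The accuracy values are hypothetical, the function name `mann_whitney_u` is illustrative, and the approximation omits the tie correction, so it is not equivalent to the R routine used for the reported analyses.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x versus y and a two-sided approximate p-value.

    Ties count as half a win. The variance below omits the tie
    correction, so the p-value is approximate when ties are frequent.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    return u, p


ALPHA = 0.05 / 3   # Bonferroni: three pairwise group comparisons per condition

# Hypothetical accuracy scores (proportion correct) for two groups.
group1 = [0.60, 0.65, 0.70, 0.55, 0.62, 0.68, 0.58, 0.66]
group2 = [0.80, 0.85, 0.78, 0.90, 0.82, 0.88, 0.79, 0.84]

u, p = mann_whitney_u(group1, group2)
significant = p < ALPHA
print(u, significant)
```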

Figure 1. Fixation patterns (n = 3) in FER discovered through EMHMM clustering. Each representative HMM included three different states (k = 3), depicted by sROI 1 (red), 2 (green) and 3 (blue). Please note that, as sROI 1 and 3 in Group 1, and sROI 2 and 3 in Group 2, were duplicates, one ellipse was displaced by 1 pixel to the right for better visualization. The third row shows the prior and transition matrices. Priors represent the probability that the first fixation belongs to each state. Gaze transition probabilities between the three different states indicate the probabilities of observing a particular transition from one state to another, or of remaining in the same state.

Figure 2. Heat maps illustrating the fixation bias of Group 1, 2 and 3, with their associated statistical difference. Significant areas are demarcated by a black line. Yellow and blue clusters represent the respective groups' differences.

Figure 3. Distribution of 72 participants to their assigned prevalent groups of fixation strategy and their general level of consistency across the twelve eye-movement datasets. One subject was not included as their EM datasets were split over different groups without any prevailing over the others. Please note that 60% of the participants ("consistent observers") use the same strategy for at least 10 out of 12 conditions.

Figure 4. (a) Percentages of observers employing the same strategy (1, 2 or 3) for static and dynamic modalities for each expression. (b) Distribution of observers employing the same strategy (1, 2 or 3) for static and dynamic modalities, from 1 to 6 FEEs.

Figure 5. Distribution of EM datasets for all expressions within groups, respectively in static and dynamic modalities. Note that the distribution of expressions between groups is not represented here.

Figure 6. Observers' FER accuracy in each of the twelve conditions across the three strategies. * p < 0.0017. Error bars represent 95% confidence intervals of the median number of correct responses for each group and condition.

Figure 7. Illustration of the six static facial expressions of emotion for one female identity.

Figure 8. A schematic representation of the procedure. Each trial started with a central fixation cross followed by a facial expression presented for 1 s at a random location on the screen (e.g., top left). After each trial, participants provided their answer using labeled keys on a keyboard. The answer screen in French reads as follows: "press p for fear, c for anger, d for disgust, j for happiness, t for sadness, s for surprise, and I for 'I don't know'".

Table 1. Group accuracy in FER.